Weighted Double Q-learning
Authors
Abstract
Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. The overestimation arises from the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators built from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning, however, sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, based on the construction of a weighted double estimator, with the goal of balancing the overestimation of the single estimator against the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
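To make the balancing idea concrete, here is a minimal tabular sketch in Python. The blend weight beta below, built from the spread of the estimated action values with a free parameter c, is an illustrative assumption; the paper's exact weight construction may differ, and alpha and gamma are ordinary learning-rate and discount hyperparameters.

```python
import numpy as np

def weighted_double_q_update(QU, QV, s, a, r, s2, alpha=0.1, gamma=0.95, c=1.0, rng=None):
    """One weighted double Q-learning update on tabular estimators QU, QV
    (2-D arrays indexed Q[state][action]). The weight beta is an
    illustrative choice, not necessarily the paper's construction."""
    rng = rng or np.random.default_rng()
    # Randomly choose which estimator to update, as in double Q-learning.
    if rng.random() < 0.5:
        QA, QB = QU, QV          # update QA, using QB as the second estimator
    else:
        QA, QB = QV, QU
    a_star = int(np.argmax(QA[s2]))   # maximizing action under QA
    a_low = int(np.argmin(QA[s2]))    # minimizing action under QA
    # Illustrative weight: a large spread suggests the maximizing action
    # genuinely stands out, so lean toward the single-estimator value;
    # a small spread suggests max-bias noise, so lean toward the double one.
    spread = abs(QB[s2][a_star] - QB[s2][a_low])
    beta = spread / (c + spread)      # beta in [0, 1)
    # Blend the single-estimator value QA[s2][a_star] (prone to overestimate)
    # with the double-estimator value QB[s2][a_star] (prone to underestimate).
    target = r + gamma * (beta * QA[s2][a_star] + (1.0 - beta) * QB[s2][a_star])
    QA[s][a] += alpha * (target - QA[s][a])
```

With c large, beta stays near 0 and the update behaves like double Q-learning; with c near 0, beta approaches 1 and it behaves like single-estimator Q-learning.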
Similar References
Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments
Although single-agent deep reinforcement learning has achieved significant success thanks to the experience replay mechanism, this mechanism needs to be reconsidered in multiagent environments. This work focuses on stochastic cooperative environments. We apply a specific adaptation to a recently proposed weighted double estimator and propose a multiagent deep reinforcement learning framework, named Weig...
A Proposal of Weighted Q-learning for Continuous State and Action Spaces
A weighted Q-learning algorithm suitable for control systems with continuous state and action spaces is proposed. The hidden layer of the RBF network is designed dynamically by means of a modified growing neural gas algorithm, so as to realize an adaptive representation of the continuous state space. Based on standard Q-learning implemented with an RBF network, the weighted Q-Le...
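For orientation, a minimal sketch of the RBF-network Q-learning baseline this abstract builds on, assuming discrete actions and fixed Gaussian centers; the dynamic hidden-layer construction via growing neural gas and the weighting scheme itself are not reproduced here.

```python
import numpy as np

def rbf_features(s, centers, width=0.5):
    """Gaussian RBF activations of a continuous state s against fixed
    centers (the paper grows these dynamically via a modified growing
    neural gas; fixed centers are a simplifying assumption here)."""
    d2 = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def q_update_rbf(W, centers, s, a, r, s2, alpha=0.05, gamma=0.95):
    """Standard Q-learning with a linear head over RBF features:
    Q(s, a) = W[a] . phi(s), one weight row per discrete action."""
    phi, phi2 = rbf_features(s, centers), rbf_features(s2, centers)
    q_next = W @ phi2                                # all action values at s2
    td_error = r + gamma * np.max(q_next) - W[a] @ phi
    W[a] += alpha * td_error * phi                   # semi-gradient TD update
```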
Double Q-learning
In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values. These overestimations result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way ...
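For reference, a minimal tabular sketch of the double estimator idea described here: one table selects the maximizing action while the other evaluates it, removing the positive bias of taking a max over noisy estimates (function name and hyperparameters are illustrative).

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s2, alpha=0.1, gamma=0.95, rng=None):
    """One tabular double Q-learning step on estimators QA, QB."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s2]))      # QA selects the action...
        target = r + gamma * QB[s2][a_star]  # ...QB evaluates it
        QA[s][a] += alpha * (target - QA[s][a])
    else:
        a_star = int(np.argmax(QB[s2]))      # symmetric update of QB
        target = r + gamma * QA[s2][a_star]
        QB[s][a] += alpha * (target - QB[s][a])
```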
Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms
Temporal-difference (TD) learning is a central approach in reinforcement learning, and Sarsa and Q-learning are among the most widely used TD algorithms. The Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends Q(σ) to an online multi-step algorithm, Q(σ, λ), using eligibility traces, and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
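As a sketch of the unification, the one-step Q(σ) backup interpolates between a sampled Sarsa target (σ = 1) and an expected target (σ = 0); the multi-step Q(σ, λ) eligibility-trace extension is not shown, and the names below are illustrative.

```python
import numpy as np

def q_sigma_target(Q, s2, a2, r, pi, sigma, gamma=0.95):
    """One-step Q(sigma) backup target. pi[s2] is the target policy's
    action distribution at state s2; with a greedy pi, sigma = 0
    recovers the Q-learning target, while sigma = 1 gives Sarsa."""
    sarsa_part = Q[s2][a2]                  # sampled successor action value
    expected_part = np.dot(pi[s2], Q[s2])   # expectation over actions
    return r + gamma * (sigma * sarsa_part + (1.0 - sigma) * expected_part)
```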
Expected masses of merging compact object binaries observed in gravitational waves
We use the well-tested StarTrack binary population synthesis code to examine the properties of the population of compact object binaries. We calculate the distribution of masses and mass ratios, taking into account weights introduced by observability in gravitational waves during inspiral. We find that in the observability-weighted distribution of double neutron star binaries there are two peaks...